I. Preliminaries

Notation. Let $M_n$ be the set of linear transformations on $\mathbb{R}^n$, and suppose
that $O_n$ denotes the group of orthogonal transformations on $\mathbb{R}^n$.

Theorem 1. In the norm operator topology on $M_n$, $O_n$ is a compact topological group. [1]

Theorem 2. If $A \in O_n$, then $\det(A) = \pm 1$. [7]

Definition 1. The orthogonal transformation $A$ is a rotation in case
$\det(A) = 1$. Otherwise, $A$ is called a reflection. Suppose that
$R$ is the subgroup of $O_n$ consisting of all rotations, and let
$\hat{R}$ denote $O_n \setminus R$.

Theorem 3. In the norm operator topology on $M_n$, $O_n$ consists of the components
$R$ and $\hat{R}$. Hence, $R$ and $\hat{R}$ are compact. [1]

Definition 2. Let $B$ denote $\{x : \|x\| = 1\}$, the set of unit vectors in $\mathbb{R}^n$.

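Theorem 4. In the norm topology on $\mathbb{R}^n$, the set $B$ is compact and connected.

Proof. Since $B$ is closed and bounded, it is compact. Moreover, $B$ is the
continuous image of $\mathbb{R}^n \setminus \{0\}$ under $x \mapsto x/\|x\|$ and hence is connected
(for $n \ge 2$, where $\mathbb{R}^n \setminus \{0\}$ is itself connected).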

Definition 3. For each $x \in B$, let the Householder transformation $H_x$ be
defined by $H_x = I - 2xx^T$. Let $\mathcal{H}_n$ denote $\{H_x : x \in B\}$, the set of all
Householder transformations. [5]
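
As a small numerical illustration of Definition 3, the following sketch builds $H_x = I - 2xx^T$ for one unit vector and forms $H_x^2$, which should be the identity. The example vector and the helper names `householder` and `matmul` are assumptions made for this illustration only; they do not come from the report.

```python
import math

def householder(x):
    """Return H_x = I - 2 x x^T as a list of rows; x is assumed to be a unit vector."""
    n = len(x)
    return [[(1.0 if i == j else 0.0) - 2.0 * x[i] * x[j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical example vector, normalized so that it lies in B.
v = [1.0, 2.0, 2.0]
norm = math.sqrt(sum(c * c for c in v))
x = [c / norm for c in v]

H = householder(x)   # symmetric by construction
HH = matmul(H, H)    # numerically the identity, so H is its own inverse
```

Symmetry together with $H_x^2 = I$ is exactly what places $H_x$ in $O_n$.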


Theorem 5. The set $\mathcal{H}_n$ is a compact, connected subset of $\hat{R}$.

Proof. By the continuity of matrix operations, $\mathcal{H}_n$ is the continuous
image of $B$ under the mapping $x \to I - 2xx^T$. Thus $\mathcal{H}_n$ is compact and
connected. Choose $x \in B$. Since $(I - 2xx^T)^T = I - 2xx^T$ and
$(I - 2xx^T)^2 = I$, it follows that $I - 2xx^T \in O_n$. Also,
$\det(I - 2xx^T) = -1$ implies that $I - 2xx^T \in \hat{R}$.

Theorem 6. (Householder) If $y \in \mathbb{R}^n \setminus \{0\}$ and $x \in B$, then there exists
a vector $w \in B$ such that $(I - 2ww^T)y = \|y\|x$. [4]

Proof. In the case where $y = \|y\|x$, choose any $w \in B$ satisfying $\langle w, y\rangle = 0$.
Otherwise, let $w$ be defined by

$$w = \frac{y - \|y\|x}{\|y - \|y\|x\|}.$$
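
The construction in the proof of Theorem 6 can be sketched as follows. This is an illustrative sketch, not code from the report: the function names are hypothetical, the tolerance `tol` is an assumed numerical threshold for deciding that $y = \|y\|x$, and the degenerate branch assumes $n \ge 2$.

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def reflect(w, y):
    """Apply H_w = I - 2 w w^T to y without forming the matrix."""
    d = sum(wi * yi for wi, yi in zip(w, y))
    return [yi - 2.0 * d * wi for wi, yi in zip(w, y)]

def householder_generator(y, x, tol=1e-12):
    """Return a unit vector w with H_w y = ||y|| x, for nonzero y and unit x."""
    ny = norm(y)
    diff = [yi - ny * xi for yi, xi in zip(y, x)]
    nd = norm(diff)
    if nd > tol:
        return [d / nd for d in diff]
    # Degenerate case y = ||y|| x: any unit w orthogonal to y works (needs n >= 2).
    k = max(range(len(y)), key=lambda i: abs(y[i]))  # index of a nonzero entry
    j = 0 if k != 0 else 1
    w = [0.0] * len(y)
    w[j], w[k] = y[k], -y[j]
    nw = norm(w)
    return [c / nw for c in w]

y = [3.0, 4.0, 0.0]
x = [0.0, 0.0, 1.0]
w = householder_generator(y, x)
image = reflect(w, y)   # approximately ||y|| x = [0, 0, 5]
```

Applying the reflection as `y - 2 <w, y> w` avoids building the full $n \times n$ matrix.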

Corollary 7. For each $x, y \in B$, there exists some $w \in B$ satisfying
$H_w(y) = x$.

It has been shown by Decell in [2], for optimal selection of linear
combinations (feature selection), that the search for an optimal solution
($k \times n$, rank $k$ matrix $B$) may be restricted to the set of $k \times n$ matrices of
the form $B = (I_k|Z)U$, where $U$ is an $n \times n$ orthogonal matrix.

H. Walker has shown that, given an optimal linear transformation $(I_k|Z)U$,
there exists a positive integer $p \le \min\{k, n-k\}$ such that $(I_k|Z)U$ may
be factored into the product $(I_k|Z)H_p \cdots H_1$, for some $H_1, \ldots, H_p \in \mathcal{H}_n$.

Note that Theorems 8 and 10, with their corollaries, in addition to
establishing the existence of the $p \le \min\{k, n-k\}$ factors $H_1, \ldots, H_p$,
yield Walker's result for a very particular sequence of transformations in
$\mathcal{H}_n$ (i.e., those derived by Householder's technique for upper triangularization).

These remarks apply to all separability criteria which are invariant
under nonsingular transformation (e.g., probability of misclassification,
Divergence, Bhattacharyya distance, Chernoff distance, Transformed Divergence).
This discussion will be summarized in Theorem 12.

Theorem 8. Let $\{e_1, \ldots, e_n\}$ denote the usual orthonormal basis for $\mathbb{R}^n$.
Suppose that $\{u_1, \ldots, u_n\}$ is any orthonormal set of vectors. Then
there exist transformations $H_1, \ldots, H_{n-1} \in \mathcal{H}_n$ such that:

(1) if $i \le n-1$, then $H_i \cdots H_1 u_j = e_j$ for all $j \le i$

(2) in addition, if $i = n-1$, $H_{n-1} \cdots H_1 u_n = \pm e_n$.

Proof. If $i = 1$, by Theorem 6 there exists a transformation $H_1 \in \mathcal{H}_n$ such
that $H_1 u_1 = e_1$. Let $p < n-1$. Suppose that $H_1, \ldots, H_p$ have been chosen such
that if $i \le p$, then $H_i \cdots H_1 u_j = e_j$ for all $j \le i$. Let the vector
$u = H_p \cdots H_1 u_{p+1}$, and suppose that $\hat{u}$ denotes the vector in $\mathbb{R}^{n-p}$
which consists of the last $n-p$ components of $u$. Likewise, let $\hat{e}$ consist of
the last $n-p$ components of $e_{p+1}$. Since $H_i$ is an isometry for each $i = 1, \ldots, p$,
we have $\|u\| = 1$. It follows from the relations $\langle u, e_j\rangle = 0$,
$j = 1, \ldots, p$, that $\hat{u}$ is a unit vector in $\mathbb{R}^{n-p}$. Again using Theorem 6,
choose $\hat{H} = I_{n-p} - 2x_{n-p}x_{n-p}^T \in \mathcal{H}_{n-p}$ such that $\hat{H}\hat{u} = \hat{e} \in \mathbb{R}^{n-p}$. Define

$$H_{p+1} = \begin{pmatrix} I_p & Z \\ Z & I_{n-p} - 2x_{n-p}x_{n-p}^T \end{pmatrix},$$

where each occurrence of $Z$ denotes the appropriate matrix or vector
of zeros. By construction $H_{p+1}u = e_{p+1}$, and for all $j \le p$,

$$H_{p+1}H_p \cdots H_1 u_j = H_{p+1}e_j = e_j.$$

It is therefore evident that if $i \le p+1$, then $H_i \cdots H_1 u_j = e_j$ for all $j \le i$.
By induction, (1) holds for all $i = 1, \ldots, n-1$. Finally, given the
Householder transformations $H_1, \ldots, H_{n-1}$ constructed above, observe
that (2) follows from $\|H_{n-1} \cdots H_1 u_n\| = 1$ and the relations
$\langle H_{n-1} \cdots H_1 u_n, e_j\rangle = 0$, $j = 1, \ldots, n-1$.
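
The inductive construction in this proof can be sketched numerically. The code below is an illustration under stated assumptions, not the report's routine: the function name `triangularizing_reflections`, the tolerance `tol`, and the example orthonormal set are hypothetical. When the current image of $u_{p+1}$ already equals $e_{p+1}$, the sketch reflects in $e_n$, one admissible choice of a generator orthogonal to $e_1, \ldots, e_{p+1}$.

```python
import math

def reflect(w, v):
    """Apply H_w = I - 2 w w^T to v."""
    d = sum(a * b for a, b in zip(w, v))
    return [b - 2.0 * d * a for a, b in zip(w, v)]

def triangularizing_reflections(us, tol=1e-12):
    """Return generators w_1..w_{n-1} and the final images of u_1..u_n."""
    n = len(us)
    images = [list(u) for u in us]   # images[j] holds H_p ... H_1 u_{j+1}
    ws = []
    for p in range(n - 1):
        # Difference between the current image of u_{p+1} and e_{p+1}.
        diff = [c - (1.0 if i == p else 0.0) for i, c in enumerate(images[p])]
        nd = math.sqrt(sum(c * c for c in diff))
        if nd > tol:
            w = [c / nd for c in diff]
        else:
            # Already equal to e_{p+1}: reflect in e_n, which fixes e_1..e_{p+1}.
            w = [0.0] * (n - 1) + [1.0]
        images = [reflect(w, v) for v in images]
        ws.append(w)
    return ws, images

# Hypothetical orthonormal set in R^3.
us = [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0], [0.8, -0.6, 0.0]]
ws, images = triangularizing_reflections(us)
```

After the loop, `images[j]` is numerically $e_{j+1}$ for the first $n-1$ vectors, and the last image is $\pm e_n$, matching parts (1) and (2).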

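Corollary 9. If $U$ is an orthogonal matrix, then there exist transformations
$H_1, \ldots, H_{n-1} \in \mathcal{H}_n$ such that:

(1) $H_{n-1} \cdots H_1 U^T = \begin{pmatrix} I_{n-1} & Z \\ Z & \pm 1 \end{pmatrix}$

(2) $\begin{pmatrix} I_{n-1} & Z \\ Z & \pm 1 \end{pmatrix} H_{n-1} \cdots H_1 U^T = I$

(3) $U^T \begin{pmatrix} I_{n-1} & Z \\ Z & \pm 1 \end{pmatrix} H_{n-1} \cdots H_1 = I$

(4) for $p = 1, \ldots, n-2$, there exists a unit vector $x_{n-p} \in \mathbb{R}^{n-p}$
such that

$$H_{p+1} = \begin{pmatrix} I_p & Z \\ Z & I_{n-p} - 2x_{n-p}x_{n-p}^T \end{pmatrix}$$

(5) exactly one of the following holds:

a) $H_{n-1} \cdots H_1 U^T = I$

b) $H_n H_{n-1} \cdots H_1 U^T = I$, where

$$H_n = \begin{pmatrix} I_{n-1} & Z \\ Z & -1 \end{pmatrix} = I - 2e_n e_n^T \in \mathcal{H}_n$$

(6) if (5) a) holds, then $U = (U^T)^{-1} = H_{n-1} \cdots H_1$,
and if (5) b) holds, then $U = (U^T)^{-1} = H_n H_{n-1} \cdots H_1$

(7) $(I_k|Z)H_n H_{n-1} \cdots H_1 = (I_k|Z)H_k \cdots H_1$.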
Proof. Since $U$ is an orthogonal matrix, we also have $U^T \in O_n$. Thus the
columns of $U^T$ form an orthonormal set of vectors $\{u_1, \ldots, u_n\}$. Choose
transformations $H_1, \ldots, H_{n-1} \in \mathcal{H}_n$ satisfying (1) and (2) of Theorem 8.
Because $U^T = (u_1 | \cdots | u_n)$, parts (1) through (6) are immediate
consequences of this theorem. Part (7) follows from the observation
that

$$(I_k|Z)H_{k+j} = (I_k|Z)$$

for all $j = 1, \ldots, n-k$.
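
Parts (6) and (7) can be checked numerically on a small example. The sketch below is only an illustration: the matrix `U`, the choice `k = 1`, and the helper names are assumptions, and the degenerate step again reflects in $e_n$. Rows of $U$ play the role of $u_1, \ldots, u_n$, since these are the columns of $U^T$.

```python
import math

def reflect(w, v):
    """Apply H_w = I - 2 w w^T to v."""
    d = sum(a * b for a, b in zip(w, v))
    return [b - 2.0 * d * a for a, b in zip(w, v)]

def generators(us, tol=1e-12):
    """Generators w_1..w_{n-1} of the reflections from Theorem 8, plus final images."""
    n = len(us)
    images = [list(u) for u in us]
    ws = []
    for p in range(n - 1):
        diff = [c - (1.0 if i == p else 0.0) for i, c in enumerate(images[p])]
        nd = math.sqrt(sum(c * c for c in diff))
        w = [c / nd for c in diff] if nd > tol else [0.0] * (n - 1) + [1.0]
        images = [reflect(w, v) for v in images]
        ws.append(w)
    return ws, images

def product_rows(ws, n):
    """Rows of H_m ... H_1 for ws = [w_1, ..., w_m]; row i equals H_1 ... H_m e_i."""
    rows = []
    for i in range(n):
        r = [1.0 if j == i else 0.0 for j in range(n)]
        for w in reversed(ws):
            r = reflect(w, r)
        rows.append(r)
    return rows

# Hypothetical orthogonal matrix; its rows are the columns u_1..u_n of U^T.
U = [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0], [0.8, -0.6, 0.0]]
n, k = len(U), 1
ws, images = generators(U)
if images[n - 1][n - 1] < 0.0:
    ws.append([0.0] * (n - 1) + [1.0])   # case (5) b): append H_n = I - 2 e_n e_n^T

full = product_rows(ws, n)          # should reproduce U, as in part (6)
partial = product_rows(ws[:k], n)   # H_k ... H_1; its first k rows match U, part (7)
```

Only the first `k` generators are needed to recover $(I_k|Z)U$, which is the point of part (7).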


One should observe that Theorem 8 may be restated in the following form.


Theorem 10. Let $\{u_1, \ldots, u_n\}$ be an orthonormal set of vectors in $\mathbb{R}^n$. Then
there exist $H_1, \ldots, H_{n-1} \in \mathcal{H}_n$ such that

(1) if $i \le n-1$, then $H_i \cdots H_1 u_j = e_j$ for all $n-i+1 \le j \le n$

(2) $H_{n-1} \cdots H_1 u_1 = \pm e_1$.




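Corollary 11. If $U$ is an orthogonal matrix, then there exist transformations
$H_1, \ldots, H_{n-1} \in \mathcal{H}_n$ such that:

(1) $H_{n-1} \cdots H_1 U^T = \begin{pmatrix} \pm 1 & Z \\ Z & I_{n-1} \end{pmatrix}$

(2) $\begin{pmatrix} \pm 1 & Z \\ Z & I_{n-1} \end{pmatrix} H_{n-1} \cdots H_1 U^T = I$

(3) $U^T \begin{pmatrix} \pm 1 & Z \\ Z & I_{n-1} \end{pmatrix} H_{n-1} \cdots H_1 = I$

(4) for $p = 1, \ldots, n-2$, there exists a unit vector $x_{n-p} \in \mathbb{R}^{n-p}$
such that

$$H_{p+1} = \begin{pmatrix} I_{n-p} - 2x_{n-p}x_{n-p}^T & Z \\ Z & I_p \end{pmatrix}$$

(5) exactly one of the following holds:

a) $H_{n-1} \cdots H_1 U^T = I$

b) $H_n H_{n-1} \cdots H_1 U^T = I$, where

$$H_n = \begin{pmatrix} -1 & Z \\ Z & I_{n-1} \end{pmatrix} = I - 2e_1 e_1^T \in \mathcal{H}_n$$

(6) if (5) a) holds, then $U = H_{n-1} \cdots H_1$, and
if (5) b) holds, then $U = H_n H_{n-1} \cdots H_1$

(7) $(I_k|Z)H_n H_{n-1} \cdots H_1 = A(I_k|Z)H_{n-k} \cdots H_1$
for some nonsingular $k \times k$ matrix $A$.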


Proof. Parts (1) through (6) of this corollary follow directly from
Theorem 10. Part (7) follows by observing that for $p = n-k+1$,

$$(I_k|Z)H_p = (I_k - 2x_k x_k^T \,|\, Z) = (I_k - 2x_k x_k^T)(I_k|Z)$$

and, for $p > n-k+1$ and $(n-p+1) + r = k$,

$$(I_k|Z)H_p = \begin{pmatrix} I_{n-p+1} - 2x_{n-p+1}x_{n-p+1}^T & Z \\ Z & I_r \end{pmatrix}(I_k|Z).$$


Theorem 12. Let $\psi$ be any real-valued separability criterion which is
invariant under nonsingular transformation. If $B$ is a rank $k$, $k \times n$
$\psi$-optimal solution of the feature selection problem, then there exist
at most $m = \min\{k, n-k\}$ Householder transformations such that
$(I_k|Z)H_m \cdots H_1$ is $\psi$-optimal (i.e., $\psi((I_k|Z)H_m \cdots H_1) = \psi(B)$).

Proof. Recall that $B = (I_k|Z)U$, for some orthogonal matrix $U$. The proof
then consists of selecting $m$ to be the smaller of $k$ and $n-k$, and
subsequently applying either Corollary 9 or Corollary 11.


II. Separability Criteria and Suggested Algorithms for Optimal
(Suboptimal) Linear Combinations


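Let $\psi$ be any continuous real function of the matrix variable $(I_k|Z)H$.
Since $\mathcal{H}_n$ is compact and $\psi$ is continuous, there is an $H_1 \in \mathcal{H}_n$ such
that

$$\psi((I_k|Z)H_1) = \underset{H \in \mathcal{H}_n}{\mathrm{l.u.b.}}\ \psi((I_k|Z)H).$$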


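Theorem 13. For each positive integer $i$, let the element $H_{i+1}$ of $\mathcal{H}_n$ be
chosen such that

$$\psi((I_k|Z)H_{i+1}H_i \cdots H_1) = \underset{H \in \mathcal{H}_n}{\mathrm{l.u.b.}}\ \psi((I_k|Z)HH_i \cdots H_1).$$

It follows that, for each $i$,

(1) $\psi((I_k|Z)H_i \cdots H_1) \le \psi((I_k|Z)H_{i+1}H_i \cdots H_1)$

(2) $\psi((I_k|Z)H_i \cdots H_1 H) \le \psi((I_k|Z)H_{i+1}H_i \cdots H_1)$ for every $H \in \mathcal{H}_n$

(3) $\psi((I_k|Z)HH_i \cdots H_1) \le \psi((I_k|Z)H_{i+1}H_i \cdots H_1)$ for every $H \in \mathcal{H}_n$

(4) $\psi((I_k|Z)H_i \cdots H_{i-p}HH_{i-(p+1)} \cdots H_1) \le \psi((I_k|Z)H_{i+1}H_i \cdots H_1)$
for every $H \in \mathcal{H}_n$ and $p = 0, \ldots, i-2$.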

Proof. As in the proof of Corollary 9, we may choose $H \in \mathcal{H}_n$ such that
$(I_k|Z)H = (I_k|Z)$ and use the definition of $\psi((I_k|Z)H_{i+1}H_i \cdots H_1)$ to
conclude that

$$\psi((I_k|Z)H_i \cdots H_1) = \psi((I_k|Z)HH_i \cdots H_1) \le \psi((I_k|Z)H_{i+1}H_i \cdots H_1).$$

Now let $H \in \mathcal{H}_n$ and define $\hat{H} = H_i(\cdots(H_1HH_1)\cdots)H_i$. By
Theorem 15, inductively conclude that $\hat{H} \in \mathcal{H}_n$, and therefore that

$$\psi((I_k|Z)\hat{H}H_i \cdots H_1) \le \psi((I_k|Z)H_{i+1}H_i \cdots H_1).$$

However, since each $H_j$ is its own inverse,

$$\hat{H}H_i \cdots H_1 = H_i \cdots H_1 H H_1 \cdots H_i H_i \cdots H_1 = H_iH_{i-1} \cdots H_1H,$$

so (2) follows. Clearly, (3) holds by the definition of $H_{i+1}$. In order to
conclude (4), suppose $0 \le p \le i-2$ and $H \in \mathcal{H}_n$. For this case, define

$$\hat{H} = H_1 \cdots H_{i-(p+1)}\,H\,H_{i-(p+1)} \cdots H_1,$$

so that $H_i \cdots H_1\hat{H} = H_i \cdots H_{i-p}HH_{i-(p+1)} \cdots H_1$.
Since $\hat{H} \in \mathcal{H}_n$, statement (4) follows from (2).


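Theorem 14. If the hypothesis of Theorem 13 is satisfied and the sequence
$\{\psi((I_k|Z)H_i \cdots H_1)\}$ is bounded above, then

$$\lim_{i \to \infty} \psi((I_k|Z)H_i \cdots H_1) = \mathrm{l.u.b.}\,\{\psi((I_k|Z)H_i \cdots H_1) : i \text{ is a positive integer}\}.$$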


Remark. It is clear that Theorems 13 and 14 remain true if "l.u.b." is
replaced by "g.l.b.", "bounded above" is replaced by "bounded below",
and the inequalities are reversed.

Note that by definition, Divergence, Bhattacharyya distance,
probability of misclassification, etc., all satisfy the hypothesis of Theorems
12 and 13. These theorems give rise to a sequential monotone procedure for
possibly obtaining a $\psi$-extremal rank $k$ linear combination matrix. At each
stage in the procedure, the extremal problem is a function of only $n$ variables.
At this point, we conjecture that the process should terminate in at most
$\min\{k, n-k\}$ steps. This conjecture is clearly in line with the $\min\{k, n-k\}$
representation of the actual $\psi$-extremal solution. In addition, all tests
of the algorithm on real data further lead one to believe that the conjecture
is fact. It is evident that this procedure is at worst sub-optimal.
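
The sequential procedure can be sketched for the simplest case $k = 1$. This is a minimal illustration under stated assumptions, not the computational scheme used in the reported tests: the criterion `psi` is a hypothetical ratio of quadratic forms (invariant under nonzero scaling of the row, which is what invariance under nonsingular transformation means when $k = 1$), the matrices `s1` and `s2` are made-up data, and a seeded random search over unit vectors stands in for a proper extremal solver. The candidate $e_n$ is always included because $H_{e_n}$ leaves $(I_1|Z)$ unchanged, the device used to prove part (1) of Theorem 13, so the recorded values cannot decrease.

```python
import math
import random

def psi(b, s1, s2):
    """Toy criterion (assumption): b S1 b^T / b S2 b^T for a row vector b."""
    def quad(s):
        return sum(b[i] * s[i][j] * b[j] for i in range(len(b)) for j in range(len(b)))
    return quad(s1) / quad(s2)

def first_row(x, p):
    """Row (I_1|Z) H_x P, i.e. (e_1 - 2 x_1 x)^T P."""
    n = len(x)
    r = [(1.0 if j == 0 else 0.0) - 2.0 * x[0] * x[j] for j in range(n)]
    return [sum(r[i] * p[i][j] for i in range(n)) for j in range(n)]

def left_multiply(x, p):
    """Return H_x P = P - 2 x (x^T P)."""
    n = len(x)
    xtp = [sum(x[i] * p[i][j] for i in range(n)) for j in range(n)]
    return [[p[i][j] - 2.0 * x[i] * xtp[j] for j in range(n)] for i in range(n)]

def random_unit(n, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    s = math.sqrt(sum(c * c for c in v))
    return [c / s for c in v]

def sequential_search(s1, s2, stages=3, trials=200, seed=0):
    """Record psi((I_1|Z) H_i ... H_1) for i = 1..stages (requires n >= 2)."""
    n = len(s1)
    rng = random.Random(seed)
    p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # H_i ... H_1
    e_n = [0.0] * (n - 1) + [1.0]
    values = []
    for _ in range(stages):
        # e_n is always a candidate, so the best value never falls below the last one.
        candidates = [e_n] + [random_unit(n, rng) for _ in range(trials)]
        best = max(candidates, key=lambda x: psi(first_row(x, p), s1, s2))
        p = left_multiply(best, p)
        values.append(psi(p[0], s1, s2))
    return values

# Hypothetical symmetric positive definite matrices.
s1 = [[1.0, 0.0, 0.0], [0.0, 4.0, 1.0], [0.0, 1.0, 3.0]]
s2 = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
values = sequential_search(s1, s2)
```

For general $k$ the same loop would act on the first $k$ rows of the running product, with `psi` replaced by a criterion such as Divergence.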

S. Marani, in [6], gives details of computational results obtained
using the sequential procedure and a very crude differential correction scheme
to solve the $n$-dimensional variational problem for the generators of the
$H_i$ at each stage of the procedure. The initial guess used at each stage of the
process was arbitrarily set at $x = \left(\frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}\right)$.

Even using this rough initial guess and the differential correction
scheme, convergence seems to be fairly rapid. Moreover, one would
reasonably expect to reduce iteration time using an improved scheme for
successive initial guesses at each stage of the procedure, together with a
more sophisticated iteration procedure.

The results obtained by Marani, despite use of the arbitrary initial
guess at each of the stages, match the results of known Divergence-optimal
solutions calculated by J. Quirein in [2]. Note that the total number of
scalar variables involved in this process cannot exceed

$$V = n + (n-1) + (n-2) + \cdots + (n-(m-1)) = nm - \frac{m(m-1)}{2},$$

where $m = \min\{k, n-k\}$.
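
The bound on the number of scalar variables can be checked by summing the stage sizes directly and comparing with the closed form $nm - m(m-1)/2$. The function name and the sample dimensions below are hypothetical.

```python
def variable_bound(n, k):
    """Upper bound V on the scalar variables used over m = min(k, n - k) stages."""
    m = min(k, n - k)
    total = sum(n - i for i in range(m))    # n + (n-1) + ... + (n-(m-1))
    closed = n * m - m * (m - 1) // 2       # closed form of the same sum
    assert total == closed
    return closed

bound = variable_bound(12, 4)   # 12 + 11 + 10 + 9 = 42
```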
In [3], Decell and Mayekar have developed an analytical
expression, in the case $\psi$ = Divergence, for the variational equations as a
function of the Householder generators for every $H_i$. This expression is
utilized in Marani's calculations. We should further point out that these
results only depend on

1. The continuity of $\psi$

2. The compactness of $\mathcal{H}_n$

3. The invariance of $\psi$ under nonsingular transformation.

The following are several related open questions: 

1. Does the process terminate in at most $\min\{k, n-k\}$ steps?

2. Does it terminate at an absolute $\psi$-extremum (rank-$k$
maximal statistic)?

3. Given $\psi_1$ and $\psi_2$, under what conditions is a $\psi_1$-extremal
solution also a $\psi_2$-extremal solution?

4. If the process does not terminate in a finite number of steps,
what can be said of the cluster points of the sequence
$\{H_i \cdots H_1\}$? (Recall that $O_n$ is compact.)
III. Several Useful Theorems

Theorem 15. Suppose that $H_x \in \mathcal{H}_n$. Then $UH_xU^T \in \mathcal{H}_n$, with $Ux$ a
generator of $UH_xU^T$, for all $U \in O_n$. In particular, for every
$H_y \in \mathcal{H}_n$, $H_yH_xH_y \in \mathcal{H}_n$. In this case, the Householder transformation
$H_yH_xH_y$ is generated by the vector $x - 2\langle x,y\rangle y$.

Proof. Suppose $U \in O_n$. Since $\|Ux\| = \|x\| = 1$, it follows that
$UH_xU^T = U(I - 2xx^T)U^T = I - 2(Ux)(Ux)^T \in \mathcal{H}_n$. If $U$ is chosen to be some
$H_y \in \mathcal{H}_n$, then the generator of $H_yH_xH_y$ is
$H_yx = (I - 2yy^T)x = x - 2y(y^Tx) = x - 2\langle x,y\rangle y$.
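
A quick numerical check of the last statement of Theorem 15: the sketch compares $H_yH_xH_y$ with the Householder matrix generated by $x - 2\langle x,y\rangle y$. The two vectors and the helper names are assumptions chosen for illustration.

```python
import math

def unit(v):
    s = math.sqrt(sum(c * c for c in v))
    return [c / s for c in v]

def householder(x):
    """Return H_x = I - 2 x x^T for a unit vector x."""
    n = len(x)
    return [[(1.0 if i == j else 0.0) - 2.0 * x[i] * x[j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical unit vectors with <x, y> != 0.
x = unit([1.0, 2.0, 2.0])
y = unit([2.0, -1.0, 0.5])

d = sum(a * b for a, b in zip(x, y))
z = [a - 2.0 * d * b for a, b in zip(x, y)]   # predicted generator H_y x

hx, hy = householder(x), householder(y)
lhs = matmul(matmul(hy, hx), hy)              # H_y H_x H_y
rhs = householder(z)
max_diff = max(abs(p - q) for ra, rb in zip(lhs, rhs) for p, q in zip(ra, rb))
```

Here `z` is automatically a unit vector because $H_y$ is an isometry, so `householder(z)` is a genuine element of $\mathcal{H}_n$.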

Theorem 16. For each $H_x$ and $H_y$ in $\mathcal{H}_n$, there exists some $H \in \mathcal{H}_n$ such that
$HH_yH = H_x$.

Proof. By Corollary 7, there exists a transformation $H \in \mathcal{H}_n$ satisfying
$Hy = x$. Therefore

$$HH_yH = H(I - 2yy^T)H = H^2 - 2Hyy^TH = H^2 - 2(Hy)(Hy)^T = I - 2xx^T = H_x.$$

Theorem 17. Suppose that $H_x, H_y \in \mathcal{H}_n$. Then the following statements are
equivalent:

(1) $\langle x,y\rangle = 0$

(2) $x \ne \pm y$ and $H_xH_y = H_yH_x = H_x + H_y - I$.

Proof. Let $\langle x,y\rangle = 0$. Then clearly $x \ne \pm y$. Moreover,

$$H_xH_y = (I - 2xx^T)(I - 2yy^T) = I - 2xx^T - 2yy^T = (I - 2xx^T) + (I - 2yy^T) - I$$
$$= H_x + H_y - I = H_y + H_x - I = H_yH_x.$$

Conversely, suppose $x \ne \pm y$ and $H_xH_y = H_yH_x$. Then
$Z = H_xH_y - H_yH_x = 4(xx^Tyy^T - yy^Txx^T)$. Using the fact that
$\langle x,y\rangle = x^Ty = y^Tx$, we obtain $4\langle x,y\rangle(xy^T - yx^T) = Z$. Since $x \ne \pm y$
implies $xy^T - yx^T \ne Z$, it follows that $\langle x,y\rangle = 0$.
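
Theorem 17 can be illustrated numerically: for an orthogonal pair the two Householder matrices commute and their product equals $H_x + H_y - I$, while for a pair that is neither orthogonal nor parallel the commutator is nonzero. The vectors and helper names below are hypothetical choices for this sketch.

```python
import math

def unit(v):
    s = math.sqrt(sum(c * c for c in v))
    return [c / s for c in v]

def householder(x):
    """Return H_x = I - 2 x x^T for a unit vector x."""
    n = len(x)
    return [[(1.0 if i == j else 0.0) - 2.0 * x[i] * x[j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

x = unit([1.0, 1.0, 0.0])
y = unit([1.0, -1.0, 0.0])   # <x, y> = 0
z = unit([1.0, 0.0, 1.0])    # <x, z> = 1/2, and z is not +x or -x

hx, hy, hz = householder(x), householder(y), householder(z)
n = len(x)
sum_minus_identity = [[hx[i][j] + hy[i][j] - (1.0 if i == j else 0.0)
                       for j in range(n)] for i in range(n)]

commute_gap = max_abs_diff(matmul(hx, hy), matmul(hy, hx))       # ~0
identity_gap = max_abs_diff(matmul(hx, hy), sum_minus_identity)  # ~0
noncommute_gap = max_abs_diff(matmul(hx, hz), matmul(hz, hx))    # clearly nonzero
```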
Theorem 18. Suppose $H_x, H_y \in \mathcal{H}_n$. Then the following are equivalent:

(1) $\langle x,y\rangle = \pm 1$

(2) $x = \pm y$

(3) $H_x = H_y$

Proof. Let $H_x = H_y$. Then $xx^T = yy^T$, and hence $x(x^Tx) = y(y^Tx)$. Since
$\|x\|^2 = 1$, it follows that $x = (y^Tx)y$. Finally,
$\|x\| = |y^Tx| \cdot \|y\|$ implies that $|\langle x,y\rangle| = |y^Tx| = 1$. The remaining
parts of the proof are immediate.


REFERENCES 


1. Chevalley, Claude, Theory of Lie Groups, Princeton University
Press, Princeton, 1946.

2. Decell, H. P. Jr. and Quirein, J. A., "An Iterative Approach
to the Feature Selection Problem", Report #26 NAS-9-12777,
Department of Mathematics, University of Houston, March 1973.

3. Decell, Henry P. Jr. and Mayekar, Mik«-, "On the Variational 
Equations for Householder Transformations in Feature Selection", 
Report #39 NAS-9-12777, Department of Mathematics, University
of Houston.

4. Householder, Alston S., "Unitary Triangularization of a
Nonsymmetric Matrix", J. Assoc. Comput. Mach., 5 (1958), 339-342.

5. Householder, Alston S., The Theory of Matrices in Numerical
Analysis, Blaisdell, New York, 1964.

6. Marani, Salma, "Routine to Find the Maximum Average Divergence",
Report #36 NAS-9-12777, Department of Mathematics, University
of Houston, August 1974.

7. Weyl, Hermann, The Classical Groups, Second Edition, Princeton
University Press, Princeton, 1946.


